A multi-pass, dynamic-vocabulary approach to real-time, large-vocabulary speech recognition
Abstract
We present a multi-pass approach to real-time, large-vocabulary speech recognition in which we dynamically manipulate the vocabulary between passes. For recognition tasks where subsets of the vocabulary can be triggered by the occurrences of other words or phrases, a combination of unknown-word modelling and vocabulary refinement can be utilized to attack large-vocabulary tasks with relatively small active vocabularies. We evaluate this approach within the JUPITER weather information domain by enabling recognition of all 30,000 city-state pairs within the USA. By maximally precompiling the static and dynamic portions of our search space using finite-state transducers (FSTs), we splice dynamic-vocabulary components on demand during decoding with negligible speed impact while enforcing cross-word context-dependent constraints. We find that a dynamic-vocabulary system can compete quite favorably with a single-pass, large-vocabulary system. For even larger vocabularies (e.g., street addresses), static compilation may be infeasible, making a dynamic-vocabulary approach nec-
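The control flow described in the abstract can be illustrated with a toy sketch. This is not the authors' system: it works on text tokens as a stand-in for acoustics, uses plain sets instead of FSTs, and every name and data value (`STATIC_VOCAB`, `CITIES_BY_STATE`, `first_pass`, `refine_vocabulary`, `second_pass`, `recognize`) is hypothetical. It only shows the idea of a small static vocabulary plus an unknown-word placeholder in the first pass, a triggered vocabulary subset, and a second pass that resolves the placeholder.

```python
# Toy sketch only: text tokens stand in for audio, sets stand in for FSTs.
# All names and data below are hypothetical, chosen for illustration.

STATIC_VOCAB = {"weather", "in", "for", "the"}
CITIES_BY_STATE = {
    "massachusetts": {"boston", "springfield"},
    "california": {"fresno", "sacramento"},
}
UNKNOWN = "<unknown_city>"


def first_pass(tokens):
    """Recognize only the small static vocabulary (carrier words and states).

    Anything else is absorbed by the unknown-word placeholder.
    """
    known = STATIC_VOCAB | set(CITIES_BY_STATE)
    return [tok if tok in known else UNKNOWN for tok in tokens]


def refine_vocabulary(hypothesis):
    """Collect the city subsets triggered by states found in the first pass."""
    active = set()
    for tok in hypothesis:
        active |= CITIES_BY_STATE.get(tok, set())
    return active


def second_pass(tokens, hypothesis, active):
    """Resolve unknown-word slots against the active dynamic vocabulary."""
    result = []
    for tok, hyp in zip(tokens, hypothesis):
        if hyp == UNKNOWN and tok in active:
            result.append(tok)   # slot filled from the triggered subset
        else:
            result.append(hyp)   # static word, or still unknown
    return result


def recognize(utterance):
    tokens = utterance.lower().split()
    hypothesis = first_pass(tokens)
    active = refine_vocabulary(hypothesis)
    return second_pass(tokens, hypothesis, active)


if __name__ == "__main__":
    print(recognize("weather in boston massachusetts"))
    print(recognize("weather in fresno massachusetts"))
```

In the second call the city is not in the subset triggered by the recognized state, so the slot stays as the unknown-word placeholder; only a few city names are ever active at once, which is the point of keeping the active vocabulary small.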
Similar resources
Two-pass Algorithm for Large Vocabulary Continuous Speech Recognition
This paper presents a two-pass algorithm for Extra Large (more than 1M words) Vocabulary COntinuous Speech recognition based on the Information Retrieval (ELVIRCOS). The principle of this approach is to decompose a recognition process into two passes where the first pass builds the word subset for the second pass recognition by using information retrieval procedure. Word graph composition for c...
Multi-pass ASR using vocabulary expansion
Current automatic speech recognition (ASR) systems have to limit their vocabulary size depending on available memory size, expected processing time, and available text data for building a vocabulary and a language model. Although the vocabularies of ASR systems are designed to achieve high coverage for the expected input data, it cannot be avoided that input data includes out-of-vocabulary (OOV...
Extra large vocabulary continuous speech recognition algorithm based on information retrieval
This paper presents a new two-pass algorithm for Extra Large (more than 1M words) Vocabulary COntinuous Speech recognition based on the Information Retrieval (ELVIRCOS). The principle of this approach is to decompose a recognition process into two passes where the first pass builds the words subset for the second pass recognition by using information retrieval procedure. Word graph composition ...
Large vocabulary Speech Recognition System: SPOJUS++
In this paper, we describe the Large Vocabulary Continuous Speech Recognition (LVCSR) system SPOJUS++, which has been developed in our laboratory for over 20 years and was recently fully reimplemented from scratch. SPOJUS++ employs a context-dependent Hidden Markov Model (HMM) as an acoustic model and an N-gram model as a language model to decode speech. SPOJUS++ has many novel features including a dyna...
Spoken Term Detection for Persian News of Islamic Republic of Iran Broadcasting
Islamic Republic of Iran Broadcasting (IRIB), one of the biggest broadcasting organizations, produces thousands of hours of media content daily. Accordingly, IRIB's archive is one of the richest archives in Iran, containing a huge amount of multimedia data. Monitoring this massive volume of data, and browsing and retrieval of this archive, is one of the key issues for this broadcasting...